open-source model
Three reasons why DeepSeek's new model matters
The long-awaited V4 is more efficient and a win for Chinese chipmakers. On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new flagship model. Notably, the model can process much longer prompts than its last generation, thanks to a new design that helps it handle large amounts of text more efficiently. Like DeepSeek's previous models, V4 is open source, meaning it is available for anyone to download, use, and modify. V4 marks DeepSeek's most significant release since R1, the reasoning model it launched in January 2025. R1, which was trained on limited computing resources, stunned the global AI industry with its strong performance and efficiency, turning DeepSeek from a little-known research team into China's best-known AI company almost overnight.
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homogeneous charts with template-based questions, leading to an overly optimistic measure of progress. We demonstrate that although open-source models can appear to outperform strong proprietary models on these benchmarks, a simple stress test with slightly different charts or questions deteriorates performance by up to 34.5%. In this work, we propose CharXiv, a comprehensive evaluation suite involving 2,323 natural, challenging, and diverse charts from scientific papers. CharXiv includes two types of questions: 1) descriptive questions about examining basic chart elements and 2) reasoning questions that require synthesizing information across complex visual elements in the chart. To ensure quality, all charts and questions are handpicked, curated, and verified by human experts. Our results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary model (i.e., GPT-4o), which achieves 47.1% accuracy, and the strongest open-source model (i.e., InternVL Chat V1.5), which achieves 29.2%. All models lag far behind human performance of 80.5%, underscoring weaknesses in the chart understanding capabilities of existing MLLMs. We hope that CharXiv facilitates future research on MLLM chart understanding by providing a more realistic and faithful measure of progress.
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models
With the rapid development of code LLMs, many popular evaluation benchmarks, such as HumanEval, DS-1000, and MBPP, have emerged to measure the performance of code LLMs with a particular focus on code generation tasks. However, they are insufficient to cover the full range of expected capabilities of code LLMs, which span beyond code generation to answering diverse coding-related questions.
What's next for Chinese open-source AI
Chinese open models are spreading fast, from Hugging Face to Silicon Valley. The past year has marked a turning point for Chinese AI. Since DeepSeek released its R1 reasoning model in January 2025, Chinese companies have repeatedly delivered AI models that match the performance of leading Western models at a fraction of the cost. Just last week the Chinese firm Moonshot AI released its latest open-weight model, Kimi K2.5, which came close to top proprietary systems such as Anthropic's Claude Opus on some early benchmarks. The difference: K2.5 is roughly one-seventh Opus's price.
Signal's Founder Built a Chatbot That Can't Spy on You
Welcome back to TIME's new twice-weekly newsletter about AI. What to Know: Signal's founder is working on encrypted chatbots. Moxie Marlinspike, the cryptographic prodigy who wrote the code that underpins Signal and WhatsApp, has a new project--and it could be one of the most important things happening in AI right now. The tool, named Confer, is an end-to-end encrypted AI assistant. It uses smart math to ensure that even though the compute-intensive process of running the AI still happens on a server in the cloud, the only person who can access the unscrambled details of that computation is you, the user.
What's next for AI in 2026
Our AI writers make their big bets for the coming year--here are five hot trends to watch. In an industry in constant flux, sticking your neck out to predict what's coming next may seem reckless. But for the last few years we've done just that--and we're doing it again. How did we do last time? Here are our big bets for the next 12 months. The last year shaped up as a big one for Chinese open-source models.
Five AI Developments That Changed Everything This Year
President Donald Trump speaks in the Roosevelt Room flanked by Masayoshi Son, Larry Ellison, and Sam Altman at the White House on January 21, 2025. In case you missed it, 2025 was a big year for AI. It became an economic force, propping up the stock market, and a geopolitical pawn, redrawing the frontlines of Great Power competition. It had both global and deeply personal effects, changing the ways that we think, write, and relate.